{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The NumPy Library\n", "\n", "## Try me\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ffraile/computer_science_tutorials/blob/main/source/Applied%20Mathematics/tutorials/Numpy%20tutorial.ipynb)[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ffraile/computer_science_tutorials/main?labpath=source%2FApplied%20Mathematics%2Ftutorials%2FNumpy%20tutorial.ipynb)\n", "\n", "[Numpy](https://numpy.org/) (Numerical Python) is a package of numerical functions for working efficiently with multidimensional data structures in Python. In plain Python, multidimensional structures (arrays and matrices) can be represented with nested lists, but this is not efficient. The Numpy library defines the numpy array object, an efficient and convenient way to define multidimensional structures.\n", "\n", "To use Numpy in your Notebooks and programs, you first need to import the package (in this example we use the alias np):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Numpy Array\n", "\n", "The numpy array uses a similar structure to a Python list, although as mentioned above, it provides additional functionality to easily create and manipulate multidimensional data structures. The data in an array are called elements and they are accessed using brackets, just as with Python lists. The dimensions of a numpy array are called **axes**. The elements within an axis are separated by commas and surrounded by brackets. Axes are also delimited by brackets, so that a numpy array is written as a nested Python list. The **rank** is the number of axes of an array. The **shape** is a tuple giving the number of elements along each axis. 
The elements of a numpy array can be of any numerical type." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "My first Numpy array:\n", "[1 2 3 4]\n", "My second Numpy array:\n", "[[1 2 3 4]\n", " [5 6 7 8]]\n", "element in position (1,2) is:\n", "7\n", "Number of dimensions:\n", "2\n", "Shape of array:\n", "(2, 4)\n", "Total number of elements:\n", "8\n" ] } ], "source": [ "a = np.array([1, 2, 3, 4]) #this creates a one dimensional array of size 4\n", "print(\"My first Numpy array:\")\n", "print(a)\n", "b = np.array([[1,2,3,4],[5,6,7,8]]) #This creates a 2-dimensional (rank 2) 2x4 array \n", "print(\"My second Numpy array:\")\n", "print(b) \n", "\n", "#You can index with brackets, as with Python lists (indices start at 0): \n", "print(\"element in position (1,2) is:\")\n", "print(b[1,2])\n", "\n", "print(\"Number of dimensions:\")\n", "print(b.ndim) #number of dimensions or rank\n", "\n", "print(\"Shape of array:\")\n", "print(b.shape) #shape (eg n rows, m columns)\n", "\n", "print(\"Total number of elements:\")\n", "print(b.size) #number of elements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Numpy Arrays\n", "\n", "Numpy includes several functions for creating arrays of a given shape whose elements are initialized with constant or random values.\n", "\n", "**Some examples:**\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1.]\n", " [1. 1.]\n", " [1. 1.]]\n", "[[0. 0. 0. 0.]\n", " [0. 0. 0. 0.]\n", " [0. 0. 0. 0.]]\n", "[0.71574091 0.54968971 0.72723399]\n", "[[12 12]\n", " [12 12]]\n", "[[1. 0. 0.]\n", " [0. 1. 0.]\n", " [0. 0. 
1.]]\n" ] } ], "source": [ "o = np.ones((3,2)) # array of 3x2 1s\n", "print(o)\n", "\n", "b=np.zeros((3,4)) # array of 3x4 zeroes\n", "print(b)\n", "\n", "c=np.random.random(3) # 1-D array of 3 random numbers in the interval [0, 1)\n", "print(c)\n", "\n", "d=np.full((2,2),12) # array of 2x2 12s\n", "print(d)\n", "\n", "e = np.random.randint(low=10, high=100, size=(4,4)) # array of shape 4x4 of integers drawn from a discrete uniform distribution in the range 10 to 99 (high is exclusive).\n", "print(e)\n", "\n", "identity_matrix =np.eye(3,3) # identity array of size 3x3\n", "print(identity_matrix)\n" ] }, { "cell_type": "markdown", "source": [ "## Creating sequences\n", "\n", "Some useful functions for creating sequences are **arange**, **linspace** and **random.randint**:\n", "\n", " - **arange(start, end, step)**: creates a numpy array with elements ranging from **start** up to, but not including, **end**, incrementing by **step**. Only end is required; using only end creates the integers from 0 to end-1.\n", " - **linspace(start,end,numvalues)**: creates a numpy array with **numvalues** evenly spaced values ranging from **start** to **end**, both included. The increment is calculated by the function so that the resulting number of elements matches the numvalues input parameter.\n", " - **random.randint(low, high, size)**: creates a numpy array of size **size** with integer values selected at random in the interval between low and high-1.\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "a = np.arange(0, 10, 2)\n", "print(a)\n", " \n", "b=np.linspace(0,10,6)\n", "print(b)\n", " \n", "c = np.random.randint(0, 2, 10)\n", "print(c)" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Element-wise operations\n", "\n", "You can apply element-wise **arithmetic** and **logical** calculations to numpy arrays using arithmetic or logical operators. 
The functions np.**exp()**, np.**sqrt()**, or np.**log()** are other examples of functions that operate on the elements of a numpy array. You can check the entire list of available functions in the official [Numpy documentation](https://numpy.org/doc/).\n", "**Some examples:**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[10 12 14 16]\n", " [18 20 22 24]]\n", "[[8 8 8 8]\n", " [8 8 8 8]]\n", "[[3. 3.16227766 3.31662479 3.46410162]\n", " [3.60555128 3.74165739 3.87298335 4. ]]\n", "[[0. 0.69314718 1.09861229 1.38629436]\n", " [1.60943791 1.79175947 1.94591015 2.07944154]]\n", "[[ 1 4 9 16]\n", " [25 36 49 64]]\n", "[[ 6 7 8 9]\n", " [10 11 12 13]]\n" ] } ], "source": [ "x =np.array([[1,2,3,4],[5,6,7,8]])\n", "y =np.array([[9,10,11,12],[13,14,15,16]])\n", "print(x+y)\n", "print(y-x)\n", "print(np.sqrt(y))\n", "print(np.log(x))\n", "print(x**2)\n", "print(x+5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Note that in the last example, we are adding a scalar value to a numpy array. In general, we can apply arithmetic operations to arrays of different shapes, provided that, comparing the shapes axis by axis starting from the last axis, the sizes are either equal or one of them is 1. When this condition is met, numpy expands the smaller array to match the shape of the larger one, an operation called **broadcasting**." ] }, { "cell_type": "markdown", "source": [ "## Indexing\n", "One of the main advantages of using Numpy is that it provides a very convenient syntax for accessing elements along every dimension of the array using **indexing** and **slicing**. 
The syntax is:\n", "\n", " array_name[start:end:step, start:end:step, ...]\n", "\n", "That is, you can use the same syntax as with Python lists to access elements in each dimension of the array, using commas to separate the dimensions. For instance, take a look at the following examples:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "x =np.array([[1, 2, 3, 4],\n", " [5, 6, 7, 8],\n", " [7, 8, 9, 10]])\n", "\n", "# Access all the elements of the first row\n", "print(\"first row:\")\n", "print(x[0, :])\n", "\n", "# Access all the elements of the first column\n", "print(\"first column:\")\n", "print(x[:, 0])\n", "\n", "# Access a 2x2 sub-matrix using slicing\n", "print(\"2x2 sub-matrix\")\n", "print(x[0:2, 1:3])\n" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "\n", "Note that the number of start:end:step triplets is at most equal to the number of dimensions of the array; any trailing dimensions that are left out are selected in full.\n", "Also, remember that the start and end values are optional and the default values are 0 and the size of the dimension, respectively. As a reminder:\n", "- The start value is optional and the default value is 0.\n", "- The end value is optional and the default value is the size of the dimension.\n", "- The step value is optional and the default value is 1.\n", "- If the step value is negative, the array is traversed in reverse order.\n", "- If the start value is negative, it is assumed to be the size of the dimension minus the absolute value of the start value.\n", "- If the end value is negative, it is assumed to be the size of the dimension minus the absolute value of the end value." 
], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "## CSV files with Numpy\n", "Luckily for us, Numpy provides functions to load data from a CSV file into an array and to write arrays to CSV files.\n", "\n", "The function [loadtxt](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html) loads data from a CSV file into a numpy array, for example:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "import numpy as np\n", "\n", "my_arr = np.loadtxt('exercise1.csv', delimiter=',', skiprows=1, usecols=(2, 3))\n", "print(my_arr.mean(axis=0))" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "In the example, we loaded the csv file and indicated that the field delimiter is a comma using the named argument ```delimiter```. We also ignored the header using the ```skiprows``` named argument, specifying that we want to skip exactly one row. Finally, since we are only interested in the temperature and humidity readings (the only ones containing numerical values), we use the named argument ```usecols``` to only load data in columns 2 and 3 (yeah, you guessed it, column indexing starts at 0). 
The result is an array with two columns, so we can for instance calculate the mean temperature and humidity using the ```mean()``` method with ```axis=0```, which averages over the rows and returns one mean per column.\n", "\n", "Conversely, the function [savetxt](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html) saves the values of a Numpy array into a CSV file:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "my_arr = np.arange(1,9)\n", "my_arr = my_arr.reshape((2,4))\n", "print(my_arr)\n", "np.savetxt('my_array.csv', my_arr, delimiter=\",\", fmt='%i')" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "This will create a csv file named 'my_array.csv' in the working directory, containing the contents of the ```my_arr``` Numpy array, using commas as the field delimiter or separator, and formatting numbers as integers." ], "metadata": { "collapsed": false } } ], "metadata": { "interpreter": { "hash": "2db524e06e9f5f4ffedc911c917cb75e12dbc923643829bf417064a77eb14d37" }, "kernelspec": { "display_name": "Python 3.8.1 64-bit", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" } }, "nbformat": 4, "nbformat_minor": 2 }